Context-based generic cross-lingual retrieval of documents and automated summaries

نویسندگان

  • Wai Lam
  • Ki Chan
  • Dragomir R. Radev
  • Horacio Saggion
  • Simone Teufel
چکیده

trieval model that can deal with different language pairs. Our model considers contexts in the query translation process. Contexts in the query as well as in the documents based on co-occurrence statistics from different granularity of passages are exploited. We also investigate cross-lingual retrieval of automatic generic summaries. We have implemented our model for two different cross-lingual settings, namely, retrieving Chinese documents from English queries as well as retrieving English documents from Chinese queries. Extensive experiments have been conducted on a large-scale parallel corpus enabling studies on retrieval performance for two different cross-lingual settings of full-length documents as well as automated summaries.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation of Text Summarization in a Cross-lingual Information Retrieval Framework

We report on research in multi-document summarization and on evaluation of summarization in the framework of cross-lingual information retrieval. This work was carried out during a summer workshop on Language Engineering held at Johns Hopkins University by a team of nine researchers from seven universities. The goals of the research were as follows: (1) to develop a toolkit for evaluation of si...

متن کامل

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

متن کامل

A system for supporting cross-lingual information retrieval

In this paper, we present the system MULINEX, a fully implemented system which supports cross-lingual search of the WWW. Users can formulate, expand and disambiguate queries, filter the search results and read the retrieved documents by using only their native language. This multilingual functionality is achieved by the use of dictionary-based query translation, multilingual document categorisa...

متن کامل

The Future of Multilingual Summarization: Beyond Sentence Extraction

In this paper I present a vision for the future of multilingual summarization that focuses on summarizing differences between documents: generating sentences that explain the main points of controversy in the document set, identifying different sides in the dialogue and the claims they support, and identifying how content differs across document boundaries (cultural, national, political, etc.)....

متن کامل

An evaluation framework for cross-lingual link discovery

Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case wit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • JASIST

دوره 56  شماره 

صفحات  -

تاریخ انتشار 2005